ABSTRACT Data on board the future PLANCK Low Frequency Instrument (LFI), to measure
the Cosmic Microwave Background (CMB) anisotropies, consist of $N$ differential temperature
measurements, spanning a range of values we shall call $R$. Preliminary studies and telemetry
allocation indicate the need to compress these data by a ratio of $c_r \gtrsim 10$. Here we present
a study of entropy for (correlated multi-Gaussian discrete) noise, showing how the optimal
compression $c_{r,opt}$, for a linearly discretized data set with $N_{bits} = \log_2 N_{max}$ bits, is given by:
$c_{r,opt} \simeq N_{bits} / \log_2(\sqrt{2\pi e}\,\sigma_e/\Delta)$, where $\sigma_e \equiv (\det C)^{1/2N}$ is some effective noise rms given
by the covariance matrix $C$ and $\Delta \equiv R/N_{max}$ is the digital resolution. This $\Delta$ only needs to
be as small as the instrumental white noise RMS: $\Delta \simeq \sigma_T \simeq 2$ mK (the nominal $\mu$K pixel
sensitivity will only be achieved after averaging). Within the currently proposed $N_{bits} = 16$
representation, a linear analogue-to-digital converter (ADC) will allow the digital storage of a large
dynamic range of differential temperature $R = N_{max}\Delta$, accounting for possible instrument drifts
and instabilities (which could be reduced by proper on-board calibration). A well calibrated signal
will be dominated by thermal (white) noise in the instrument: $\sigma_e \simeq \sigma_T$, which could yield large
compression rates $c_{r,opt} \simeq 8$. This is the maximum lossless compression possible. In practice,
point sources and $1/f$ noise will produce $\sigma_e > \sigma_T$ and $c_{r,opt} < 8$. This strategy seems safer
than non-linear ADC or data reduction schemes (which could also be used at some stage).
1. INTRODUCTION

A compression rate of about $c_r \simeq 10$ is required on board the PLANCK Satellite
LFI (see §2.1 below). The data rate could be reduced by accounting for the relative
significance of different bits (large and small temperature differences) in the
analogue-to-digital converters (ADC) (Herreros et al. 1997). A further compression
is assumed to be possible with classical lossless data compression techniques.

Typically, standard lossless data compression techniques are applied successfully
only to data sets with some redundancy. This redundancy can be formally expressed
using the entropy per component (Shannon's entropy), $h$. A discretized data set can
be represented by $N_{bits}$ bits per measurement, which for a linear ADC is typically set by the
maximum range $N_{max}$: $N_{bits} = \log_2 N_{max}$. If we express the joint probability for a set of $N$
measurements as $p_{i_1,\dots,i_N}$, the Shannon entropy per component of the
data set is:

$$ h = -\frac{1}{N} \sum_{i_1,\dots,i_N} p_{i_1,\dots,i_N} \log_2 p_{i_1,\dots,i_N} . \qquad (1) $$
Table 1. Parameters for the radiometers: a) central frequency (bandwidth is 20%); b) angular
resolution (beam FWHM); c) the RMS thermal noise expected at 6.9 ms (144.9 Hz) sampling;
d) range of temperatures expected from the sky (Jupiter, dipole, S-Z); e) number of detectors
(2 x horns); f) total data rate at 6.9 ms (2.5 arcmin); g) data rate for pixels of length
FWHM/2.5 along the scanning circle.



Shannon's theorem states that $h$ is a lower bound to the average length of the code
units. We define the theoretical (optimal) compression rate as

$$ c_{r,opt} \equiv \frac{N_{bits}}{h} . \qquad (2) $$

For a uniform distribution of $N$ measurements we have $p_i = 1/N$ and $h = \log_2 N$,
which equals the number of bits per datum. Thus it is not possible to compress a
(uniformly) random distribution of measurements. If noise is discretized to a high
resolution (as compared to its variance) the resulting distribution of numbers
approaches a uniform distribution and is therefore virtually impossible to compress.
This indicates that, to a first approximation, it is difficult to produce a lossless
compression algorithm when the data are dominated by noise. As we shall
see, however, the problem depends crucially on the digital resolution and the range of values
to be stored.
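
The dependence on the digital resolution can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the function name `discrete_gaussian_entropy`, the uncorrelated zero-mean Gaussian noise model and the chosen ratios of resolution to rms are ours, not part of the original analysis. It bins the noise on a linear grid of step `delta` and compares the resulting Shannon entropy with the continuum approximation $\log_2(\sqrt{2\pi e}\,\sigma/\Delta)$.

```python
import math

def discrete_gaussian_entropy(sigma, delta, n_sigma=10.0):
    """Shannon entropy (bits) of zero-mean Gaussian noise binned at width delta."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))
    k_max = int(math.ceil(n_sigma * sigma / delta))
    h = 0.0
    for k in range(-k_max, k_max + 1):
        # probability that a sample falls in the bin centred on k * delta
        p = cdf((k + 0.5) * delta) - cdf((k - 0.5) * delta)
        if p > 0.0:
            h -= p * math.log2(p)
    return h

N_BITS = 16  # bits per stored sample, as in the text
for ratio in (1.0, 0.1, 0.01):  # assumed values of delta / sigma
    h = discrete_gaussian_entropy(sigma=1.0, delta=ratio)
    h_cont = math.log2(math.sqrt(2.0 * math.pi * math.e) / ratio)
    print(f"delta/sigma={ratio:5.2f}  h={h:6.3f} bits  "
          f"continuum={h_cont:6.3f} bits  N_bits/h={N_BITS / h:5.2f}")
```

The finer the grid relative to the noise rms, the more bits per sample are irreducibly random and the smaller the achievable ratio $N_{bits}/h$ becomes.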

2. THE COMPRESSION PROBLEM 

2.1 Data Rate, Telemetry and compression 



Following the PLANCK LFI Scientific and Technical Plan (Part I, §6.3, Mandolesi
et al. 1998) the raw data rate of the LFI is $r_d \simeq 260$ Kb s$^{-1}$. This assumes: i)
a sampling interval of 6.9 ms, i.e. a sampling frequency $f_{sampl} = 144.9$ Hz, which corresponds
to 2.5 arcmin on the sky, 1/4 of the FWHM at 100 GHz; ii) $N_{detec} = 112$ detectors: sky and
reference load temperature for 56 radiometers; iii) an $N_{bits} = 16$ bit data representation.
Thus the raw data rate is:

$$ r_d = f_{sampl} \times N_{detec} \times N_{bits} \simeq 259.7 \ {\rm Kb\,s^{-1}} . \qquad (3) $$






The values for each channel are shown in Table 1. A factor of two reduction can be 
obtained by only transmitting the difference between sky and reference temperature. 
To allow for the recovery of diagnostic information on the separate stability of the 
amplifiers and loads, the full sky and reference channels of a single radiometer could 
be sent at a time (changing the selected radiometer from time to time to cover all 
channels). 

Note that the sampling interval of 6.9 ms corresponds to 2.5 arcmin on the
sky, which is smaller than the nominal FWHM resolution. Adjacent pixels in a
circle could be averaged on board to obtain the nominal resolution (along the circle
direction). In this case the pixel size should still be at least ~2.5 times smaller than the
FWHM to allow for a proper map reconstruction. Note that consecutive circles on the sky
will be separated by 2.5 arcmin, so even after this averaging along the circle scan there is
still a lot of redundancy across circles. For pixels of size $\theta \simeq$ FWHM/2.5 along the
circle scan the total scientific rate could be reduced to $r_d \simeq 67$ Kb s$^{-1}$, as shown in
Table 1 (or 134 Kb s$^{-1}$ with some subset of the reference load information).
The telemetry allocation for the LFI scientific data is expected to be $r_t \simeq 20$ Kb s$^{-1}$.
Thus the target compression rates are about:

$$ c_r = \frac{r_d}{r_t} \simeq 3 - 13 , \qquad (4) $$

depending on the actual on-board processing and requirements.
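
The arithmetic behind Eqs.(3) and (4) can be checked with a few lines of Python. This is a sketch only: the variable and function names are ours, the 67 and 134 Kb/s figures are the Table 1 values quoted in the text, and 1 Kb is taken to mean 1000 bits.

```python
F_SAMPL_HZ = 144.9        # sampling frequency (6.9 ms sampling)
N_DETEC = 112             # sky + reference load for 56 radiometers
N_BITS = 16               # bits per sample
R_TELEMETRY_KBPS = 20.0   # expected telemetry allocation

def raw_rate_kbps(f_sampl_hz, n_detec, n_bits):
    """Raw data rate in Kb/s (assuming 1 Kb = 1000 bits)."""
    return f_sampl_hz * n_detec * n_bits / 1000.0

r_raw = raw_rate_kbps(F_SAMPL_HZ, N_DETEC, N_BITS)
scenarios = {
    "raw (sky + reference)": r_raw,
    "sky - reference difference only": r_raw / 2.0,
    "FWHM/2.5 pixels + ref. subset (Table 1)": 134.0,
    "FWHM/2.5 pixels (Table 1)": 67.0,
}
for name, rate in scenarios.items():
    # required compression = data rate / telemetry rate
    print(f"{name:42s} {rate:6.1f} Kb/s   c_r = {rate / R_TELEMETRY_KBPS:4.1f}")
```

The extremes of the loop reproduce the range quoted in Eq.(4): about 13 for the raw stream and about 3.4 for the averaged sky-only stream.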

2.2 Data Structure 

The Planck satellite spins with a frequency $f_{spin} = 1$ rpm so that the telescope (pointing
at approximately right angles to the spin axis) sweeps out nearly great circles on
the sky. Each circle is scanned for 1 (or 2) hours, so that there are 60 (or 120)
images of the same pixel. Each measurement is dominated by instrumental
noise, $\sigma_T \simeq 2$ mK (see Table 1), rather than by the CMB fluctuations ($\sigma_{CMB} \simeq 10^{-2}$ mK).
If this noise (at frequencies smaller than $f_{spin}$) is mostly thermal, there is no
redundancy in these images and little hope for compression. But one could then say
that in this case there is no need for compression, as we can just average those 60
images of a pixel and only send the mean down to Earth. The problem is
that one expects $1/f$ noise to dominate the instrument noise at frequencies smaller
than $\lesssim 0.1$ Hz. Thus, compression is only required when we want to keep these 60
(120) images in order to correct for the instrument instability in the data reduction
process (on Earth). This $1/f$ type of noise is more redundant and might be subject
to some compression, but even in this case, if we keep it to a high resolution
(as compared to its rms), the resulting probabilities would be close to a uniform
distribution and compression would be nearly impossible.

2.3 Dynamic range & calibration 

The final dynamic range for the measured temperature differences per angular resolution
pixel will be $\Delta T \simeq 1\,\mu$K to 1 K. The lowest value, ~1 $\mu$K, will only be
obtained after averaging all data. The highest, ~1 K, is the hottest source that
we want to keep (not saturated) at any one of the frequencies. Positive signals from
Jupiter, which will be used for calibration, can be as large as ~0.7 K at 100 GHz.
Other point sources and the Galaxy give intermediate positive values. Negative
differences (with respect to the mean CMB $T \simeq 2.7$ K), of the order of a few mK,
can originate from the dipole (the relative velocity between the satellite
and the CMB rest frame). The Sunyaev-Zeldovich effect (towards a total of a few
hundred clusters of galaxies) can also give a negative signal of a few 10 mK. Thus
the overall range of measurements is -30 mK to 1 K.

As pointed out by Herreros et al. (1997) the temperature resolution is given
by the receiver noise $\sigma_T$ on the sampling time of 6.9 ms (or the corresponding value if
there is some on-board averaging) and not by the final target sensitivity. At the
end of the mission, each FWHM pixel will have been measured ~$10^6$ times. Thus
a finer resolution of $\Delta T \simeq 1\,\mu$K is not necessary on board, given that the raw signal
is dominated by the white noise component. This finer resolution will later be
obtained by the pixel averaging (data reduction on Earth).

We can distinguish two basic components for the receiver noise: the white or
thermal noise, and the instabilities or calibration gains (like the $1/f$ noise). An
example is given by the following power spectrum as a function of frequency $f$:

$$ P(f) = A \left( 1 + \frac{f_{knee}}{|f|} \right) . \qquad (5) $$

The 'knee' frequency, $f_{knee}$, is expected to be $f_{knee} \simeq 0.005$ Hz for a 4 K load or
$f_{knee} \simeq 0.06$ Hz for a 20 K load. The expected RMS thermal noise, $\sigma_T \propto A$, at the
sampling frequency (2.5 arcmin) is listed in Table 1. The lowest value is given by the
30 GHz channel and could be further reduced to ~1 mK if the data is averaged to
FWHM/2.5 to obtain the nominal resolution. The larger values in the dynamic
range can be affected by the calibration gains. This is important and should be
carefully taken into account if a non-linear ADC is used, as gains could then change
the relative significance of measurements (e.g., less significant bits shifting because
of gains). In fact, a $1/f$ power spectrum with knee frequency $f_{knee}$, integrated
over a time $T$, gives a diverging rms noise:
$$ \sigma_{1/f}^2 = \frac{1}{f_{max}} \int_{1/T}^{f_{max}} df \, \frac{\sigma_T^2 \, f_{knee}}{f} = \sigma_T^2 \, \frac{f_{knee}}{f_{max}} \, \ln(T f_{max}) . \qquad (6) $$

For a $T \simeq 1$ year mission, pixels averaged after successive pointings have
$f_{max} \simeq 10^{-4}$ Hz, and the $1/f$ contribution is $\sigma_{1/f}^2 \sim 10^4 \sigma_T^2$! This illustrates
why the calibration problem is so important and makes a large dynamic range
desirable. Averaging pixels at the spin rate, $f_{max} = f_{spin}$, gives $\sigma_{1/f}^2 \sim 10\,\sigma_T^2$;
this is not too bad for the dynamic range, but it corresponds to a mean value and
there could be larger instantaneous or temporary gains. Drifts with periods
longer than the spin period (1 min) can be removed by requiring that the average
signal over each rotation at the same pointing remains constant. Drifts between
pointings (after 1 or 2 hours) could be reduced by using the overlapping pixels. All
this can be easily done on board, while a more careful matching is still possible (and
necessary) on Earth. This allows the on-board gain to be calibrated on timescales
longer than 1 min with an accuracy given by $\sigma_T$. Additional and more careful
in-flight calibration can also be done using the signal from external planets and
the CMB dipole. In any case we will assume here that instabilities or gains are
under control ($\sim \sigma_T$) on timescales longer than the spin period. For shorter
times we will typically use Eq.(5) as a mean value, bearing in mind that larger
gains are also possible.

In summary, because of the possible instrument gains, it is important to have
a constant resolution of $\sim \sigma_T \simeq 1$ mK over a large range of values ($\Delta T \simeq 1$ K)
to be able to recover the underlying signal after proper calibration. This could
be partially done on board. A constant resolution indicates the need for a linear
ADC, which with adequate compression (presented next) will be shown to be a
good alternative to a non-linear ADC.

3. A SOLUTION TO THE PROBLEM 

In a separate paper (Romeo et al. 1998) we have presented a general study of
(correlated multi-Gaussian) noise compression by studying the Shannon entropy per
component $h$, and therefore the optimal compression $c_{r,opt}$ in Eq.(2). For linearly
discretized data with $N_{bits} = \log_2 N_{max}$ bits, $h$ in Eq.(1) depends only on the ratio
of the digital resolution $\Delta$ to the effective rms noise, $\sigma_e$:

$$ h = \log_2 \left( \sqrt{2\pi e} \, \frac{\sigma_e}{\Delta} \right) , $$

where $\sigma_e \equiv (\det C)^{1/2N}$ and $C$ is the covariance matrix of the (random) field $x_i$:
$C_{ij} = \langle x_i x_j \rangle$. As mentioned in the section above, $\Delta$ in Planck only
needs to be as small as the instrumental white noise $\sigma_T$. If the data is dominated
by thermal instrumental noise we have $\sigma_e \simeq \sigma_T \simeq \Delta$ and the optimal compression
is simply:

$$ c_{r,opt} \simeq \frac{N_{bits}}{\log_2 \sqrt{2\pi e}} \simeq 8 , $$

where we have used $N_{bits} = 16$ as planned for the Planck LFI. This is the maximum
lossless compression that can be achieved for a well calibrated signal dominated
by instrumental thermal (white) noise. This very large compression rate can be
obtained because there is a large range of values, $\sim \Delta \, 2^{N_{bits}}$, which has a very small
probability and can therefore be easily compressed (e.g. by Huffman or arithmetic
coding), but which cannot be omitted because these values are needed to calibrate the instrument
gains (and to measure the point sources and the galaxy). In practice point sources
and $1/f$ noise will produce $\sigma_e > \sigma_T$ and $c_{r,opt} < 8$ (note that in general $\sigma_e < \sigma_0$,
the one-point RMS fluctuation), and one will also need to account for the fact that the
galaxy and the point sources cannot be represented by a multi-Gaussian, even if we
allow for a different power spectrum. Let us consider a more realistic case with $1/f$ noise,
but still without point sources or the galaxy.
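
As an illustration of how the effective rms $\sigma_e = (\det C)^{1/2N}$ enters the optimal compression, the following sketch evaluates it for two toy covariance matrices. Everything specific here is an assumption made for the example: the helper names `effective_rms` and `optimal_compression`, the matrix size, and the correlated model (white noise plus an exponentially correlated component), which is not the instrument noise model.

```python
import numpy as np

def effective_rms(cov):
    """sigma_e = (det C)^(1/2N), via the log-determinant for numerical stability."""
    n = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    return float(np.exp(logdet / (2.0 * n)))

def optimal_compression(cov, delta, n_bits=16):
    """c_r,opt = N_bits / log2(sqrt(2 pi e) * sigma_e / delta)."""
    h = np.log2(np.sqrt(2.0 * np.pi * np.e) * effective_rms(cov) / delta)
    return float(n_bits / h)

n = 256                      # number of measurements in the toy data set
sigma_t = 1.0                # white-noise rms; the resolution is set to delta = sigma_t
idx = np.arange(n)
white = sigma_t**2 * np.eye(n)
# assumed toy model: white noise plus an exponentially correlated component
mixed = white + sigma_t**2 * 0.9 ** np.abs(idx[:, None] - idx[None, :])

for name, cov in (("white", white), ("white + correlated", mixed)):
    sigma_0 = float(np.sqrt(cov[0, 0]))  # one-point rms fluctuation
    print(f"{name:20s} sigma_0={sigma_0:.3f}  sigma_e={effective_rms(cov):.3f}  "
          f"c_r,opt={optimal_compression(cov, delta=sigma_t):.2f}")
```

In this toy case the correlated component raises $\sigma_e$ above $\sigma_T$ while keeping it below the one-point rms $\sigma_0$, so the optimal compression falls below the thermal value, in line with the inequalities quoted above.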
Table 2. Compression factors for different instrumental noise.



3.1 Compression with 1/f noise 

In the case of the power spectrum in Eq.(5), Romeo et al. (1998) find for the
entropy:

where $h_0$ is the white noise (thermal) contribution [$h_0 = \log_2(\sqrt{2\pi e}\,\sigma_T/\Delta)$] and
$f_{max}$ and $f_{min}$ are the maximum and minimum frequencies covered by the $N$
measurements. In our case we are only interested in the contribution within a
revolution (as we are assuming a good calibration at smaller frequencies) so that
$f_{max} = f_{sampl} \simeq 145$ Hz and $f_{min} = f_{spin} = 1/60$ Hz.

Another case which can be of interest is:

Taking again as reference $h_0$, the thermal case where $f_{knee} = 0$ and $\Delta' = \Delta$, we
may write

In Table 2 we present some compression rate values corresponding to
this entropy for different 'knee' frequencies.
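
To give a feeling for the size of the $1/f$ correction, the sketch below uses a standard result for stationary Gaussian noise rather than the specific expression of Romeo et al. (1998): the entropy per sample exceeds the white-noise value $h_0$ by the band average of $\frac{1}{2}\log_2[P(f)/A]$. The choice of averaging band $[f_{min}, f_{max}]$ follows the limits quoted above, the resolution is set to $\Delta = \sigma_T$, and the function name `delta_h_bits` and the list of knee frequencies are assumptions for illustration; the output is not meant to reproduce Table 2.

```python
import math

def delta_h_bits(f_knee, f_min, f_max):
    """Band average of 0.5 * log2(1 + f_knee / f) over [f_min, f_max], in bits."""
    if f_knee == 0.0:
        return 0.0
    def antiderivative(f):
        # integral of ln(1 + f_knee / f) df = (f + f_knee) ln(f + f_knee) - f ln f
        return (f + f_knee) * math.log(f + f_knee) - f * math.log(f)
    integral = antiderivative(f_max) - antiderivative(f_min)
    return 0.5 * integral / ((f_max - f_min) * math.log(2.0))

N_BITS = 16
F_MIN, F_MAX = 1.0 / 60.0, 145.0   # spin and sampling frequencies in Hz
H0 = math.log2(math.sqrt(2.0 * math.pi * math.e))  # thermal case with delta = sigma_T

for f_knee in (0.0, 0.005, 0.06, 1.0):  # Hz; 1.0 is an arbitrary pessimistic value
    h = H0 + delta_h_bits(f_knee, F_MIN, F_MAX)
    print(f"f_knee={f_knee:6.3f} Hz  h={h:6.4f} bits  c_r={N_BITS / h:5.2f}")
```

Under these assumptions the extra entropy from $1/f$ noise within one revolution is a small fraction of a bit, so the compression stays close to the thermal value.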

In order to quote a more realistic compression rate, it is crucial to have a detailed
model of the instrument instabilities (i.e., what is the value of the $f_{knee}$ frequency?
what is the value of the white noise amplitude?), a detailed ADC model, the
on-board data and calibration strategy, the pointing, and detailed simulations of
the sky.



5. CONCLUSION 

Because of the possible instrument gains, it is important to have a constant resolution
of $\sim \sigma_T \simeq 1$ mK over a large range of values ($\Delta T \simeq 1$ K). This indicates the
convenience of a linear ADC. Although some compression can be achieved with a
non-linear ADC, in this case standard linear lossless data compression techniques
seem safer (because of possible calibration drifts) and more efficient (because of the
larger compression rates).

The maximum lossless compression that can be achieved with a well calibrated
signal is $c_r \simeq 8$ (with data of $N_{bits} = 16$). Similar values can be obtained (§3.1)
even for non-thermal instrumental noise. These results assume that the dominant
component of the data is multi-Gaussian (correlated) noise. Although this might be
true for the mean instrumental noise and the CMB signal, it is not true for point
sources or the galaxy. Nevertheless, a compression factor of $c_r \gtrsim 3$ can be easily
obtained if we assume that at least ~75% of the data is described by well-calibrated
instrumental noise (the CMB is only relevant after averaging over many pixels).
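
As a rough check of this figure, assume (a simplification made here for illustration) that a fraction $q = 0.75$ of the samples compresses at the thermal optimum $c_{r,opt} \simeq 8$ while the remaining fraction is not compressed at all. The overall factor is then

$$ c_r = \frac{1}{q / c_{r,opt} + (1 - q)} = \frac{1}{0.75/8 + 0.25} \simeq 2.9 , $$

and any compression of the remaining 25% pushes the result above 3.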

Thus, compression factors in the range $c_r \simeq 3 - 7$ are possible within the
approximation for the data structure we have considered. Compression of the raw
data including the reference load ($r_d \simeq 260$ Kb s$^{-1}$) does not seem possible with
a telemetry rate of $r_t \simeq 20$ Kb s$^{-1}$, but it might be possible for $r_t \simeq 40$ Kb s$^{-1}$.
An alternative is to process on board some of the current redundancy by stacking
nearby pixels to the level of the nominal resolution (FWHM/2.5), as indicated in
Table 1. The actual compression can be achieved with standard Huffman or
arithmetic coding, although other possibilities can also be considered or tailored for this
problem (see Romeo et al. 1998 for more details).
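
A minimal sketch of how Huffman coding approaches the entropy bound for thermal noise is given below. It is an illustration under assumptions, not an on-board algorithm: the noise is taken as uncorrelated zero-mean Gaussian with $\Delta = \sigma_T$, the alphabet is truncated at 8 sigma, and the function names `quantized_gaussian_probs` and `huffman_code_lengths` are ours.

```python
import heapq
import math

def quantized_gaussian_probs(sigma, delta, n_sigma=8.0):
    """Bin probabilities of zero-mean Gaussian noise on a linear grid of step delta."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))
    k_max = int(math.ceil(n_sigma * sigma / delta))
    probs = {k: cdf((k + 0.5) * delta) - cdf((k - 0.5) * delta)
             for k in range(-k_max, k_max + 1)}
    probs = {k: p for k, p in probs.items() if p > 0.0}
    total = sum(probs.values())
    return {k: p / total for k, p in probs.items()}  # renormalise after truncation

def huffman_code_lengths(probs):
    """Return {symbol: code length in bits} for a binary Huffman code."""
    # heap entries: (probability, unique tie-breaker, symbols in this subtree)
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probs}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1  # each merge adds one bit to every symbol it contains
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return lengths

N_BITS = 16
probs = quantized_gaussian_probs(sigma=1.0, delta=1.0)
lengths = huffman_code_lengths(probs)
entropy = -sum(p * math.log2(p) for p in probs.values())
mean_len = sum(probs[s] * lengths[s] for s in probs)
print(f"entropy = {entropy:.3f} bits, mean Huffman length = {mean_len:.3f} bits")
print(f"compression: bound = {N_BITS / entropy:.2f}, Huffman = {N_BITS / mean_len:.2f}")
```

The mean code length lies between the entropy and the entropy plus one bit, so a symbol-by-symbol Huffman code falls somewhat short of the optimal rate. A real 16-bit stream would also need codes (or an escape mechanism) for the rare large values outside the truncated alphabet used here.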